 agnostic learning


Proper Agnostic Learning of Functions of Halfspaces under Gaussian Marginals

arXiv.org Machine Learning

We study the problem of computationally efficient proper agnostic learning of multidimensional concept classes under the Gaussian distribution. In this setting, given i.i.d. labeled samples from an unknown distribution over $\mathbb{R}^d \times \{\pm 1\}$ whose marginal on $\mathbb{R}^d$ is Gaussian, the goal is to output a hypothesis from a target class $\mathcal{F}$ whose 0-1 loss is within $\epsilon$ of that of the best classifier in $\mathcal{F}$. We give the first efficient proper agnostic learning algorithm for arbitrary Boolean functions of $K$ halfspaces under Gaussian marginals. Our algorithm runs in time $d^{O(K^2 \log(1/\epsilon)/\epsilon^2)} + (K/\epsilon)^{O(K^3/\epsilon^{2.5})}$. Prior to our work, the only known algorithm for $K \geq 2$ was brute-force search, with run-time exponential in $d$. Moreover, the dependence of our run-time on the dimension $d$ matches that of the best known improper learning algorithm, namely $d^{\widetilde{O}(K^2/\epsilon^2)}$. For the special case of a single halfspace ($K=1$), the best previous run-time was $d^{O(1/\epsilon^4)} + (1/\epsilon)^{O(1/\epsilon^6)}$. Our algorithm improves this to $d^{O(1/\epsilon^2)} + (1/\epsilon)^{O(1/\epsilon^{2.5})}$. Once again, the dependence on $d$ matches that of the best known improper algorithm, namely $d^{O(1/\epsilon^2)}$. Furthermore, the dependence of our run-time on the dimension $d$ is essentially optimal in the statistical query model.


Swap Agnostic Learning, or Characterizing Omniprediction via Multicalibration

Neural Information Processing Systems

We introduce and study Swap Agnostic Learning. The problem can be phrased as a game between a predictor and an adversary: first, the predictor selects a hypothesis $h$; then, the adversary plays in response, and for each level set of the predictor $\{x \in \mathcal{X} : h(x) = v\}$ selects a loss-minimizing hypothesis $c_v \in \mathcal{C}$; the predictor wins if $h$ competes with the adaptive adversary's loss. Despite the strength of the adversary, our main result demonstrates the feasibility of Swap Agnostic Learning for any convex loss. Somewhat surprisingly, the result follows by proving an equivalence between Swap Agnostic Learning and swap variants of the recent notions of Omniprediction [15] and Multicalibration [20]. Beyond this equivalence, we establish further connections to the literature on Outcome Indistinguishability [6, 14], revealing a unified notion of OI that captures all existing notions of omniprediction and multicalibration.
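The game described in this abstract can be made concrete on a toy finite dataset. The following is a minimal sketch, not the paper's algorithm: it only evaluates the two sides of the game for a given predictor. The squared loss, the three constant hypotheses, and the names `swap_benchmark` and `predictor_loss` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def squared_loss(y, t):
    """Squared loss, one example of a convex loss (an assumption of this sketch)."""
    return (y - t) ** 2


def predictor_loss(data, h, loss=squared_loss):
    """Average loss of the predictor h over the dataset."""
    return sum(loss(y, h(x)) for x, y in data) / len(data)


def swap_benchmark(data, h, hypotheses, loss=squared_loss):
    """Average loss of the adaptive adversary.

    The data is split into the level sets {x : h(x) = v}; on each level set
    the adversary separately picks the hypothesis with the smallest loss.
    """
    level_sets = defaultdict(list)
    for x, y in data:
        level_sets[h(x)].append((x, y))
    total = 0.0
    for points in level_sets.values():
        total += min(sum(loss(y, c(x)) for x, y in points) for c in hypotheses)
    return total / len(data)


# Hypothetical toy data: (x, y) pairs with binary labels.
data = [(0, 0), (1, 1), (2, 1), (3, 0)]
# Hypothetical comparison class: three constant functions.
hypotheses = [lambda x: 0.0, lambda x: 1.0, lambda x: 0.5]

h_const = lambda x: 0.5                          # one level set
h_split = lambda x: 1.0 if x in (1, 2) else 0.0  # two level sets

for name, h in [("h_const", h_const), ("h_split", h_split)]:
    gap = predictor_loss(data, h) - swap_benchmark(data, h, hypotheses)
    print(name, "gap to adaptive adversary:", gap)
```

A predictor does well in this game when the printed gap is small. Note that the benchmark depends on the predictor itself, because the level sets are induced by `h`.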



A Omitted Proofs

Neural Information Processing Systems

Taking = p / gives the desired claim. Claim 2.7, we know that the multicalibration violation for The inequalities follow by Hölder's inequality and the assumed bound on the weight of Recall that Cov[y, z] = E[yz] - E[y] E[z]. Here, we give a high-level overview of the MCBoost algorithm of [20] and weak agnostic learning. Algorithm 2 MCBoost Parameters: hypothesis class C and > 0 Given: Dataset S sampled from D Initialize: p(x) ← 1/2. By Lemma 3.8, we know that In this Appendix, we give a full account of the definitions and results stated in Section 4.



Beyond Perturbations: Learning Guarantees with Arbitrary Adversarial Test Examples

Neural Information Processing Systems

In particular, for any function in a class C of bounded VC dimension, we guarantee a low test error rate and a low rejection rate with respect to P. Our algorithm is efficient given an Empirical Risk Minimizer (ERM) for C.



Agnostic Learning with Multiple Objectives

Neural Information Processing Systems

Most machine learning tasks are inherently multi-objective. This means that the learner has to come up with a model that performs well across a number of base objectives $\mathcal{L}_{1}, \ldots, \mathcal{L}_{p}$, as opposed to a single one. Since optimizing with respect to multiple objectives at the same time is often computationally expensive, the base objectives are often combined in an ensemble $\sum_{k=1}^{p}\lambda_{k}\mathcal{L}_{k}$, thereby reducing the problem to scalar optimization. The mixture weights $\lambda_{k}$ are set to uniform or some other fixed distribution, based on the learner's preferences. We argue that learning with a fixed distribution on the mixture weights runs the risk of overfitting to some individual objectives and significantly harming others, despite performing well on an entire ensemble. Moreover, in reality, the true preferences of a learner across multiple objectives are often unknown or hard to express as a specific distribution. Instead, we propose a new framework of Agnostic Learning with Multiple Objectives (ALMO), where a model is optimized for any weights in the mixture of base objectives. We present data-dependent Rademacher complexity guarantees for learning in the ALMO framework, which are used to guide a scalable optimization algorithm and the corresponding regularization.
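The contrast between fixed mixture weights and optimizing for any weights can be seen on a one-parameter toy problem. The sketch below is not the paper's algorithm. It assumes that "optimized for any weights" is read as minimizing the worst case over all mixture weights on the simplex, and the two quadratic objectives and the grid search are hypothetical choices made for illustration.

```python
def objectives(theta):
    """Two hypothetical base objectives for a scalar model parameter theta."""
    return [theta ** 2, 4.0 * (theta - 1.0) ** 2]


def mixture_loss(theta, weights):
    """Scalarized loss: weighted sum of the base objectives with fixed weights."""
    return sum(w * l for w, l in zip(weights, objectives(theta)))


def agnostic_loss(theta):
    """Worst case of the mixture loss over all weights on the simplex.

    A linear function of the weights is maximized at a vertex of the simplex,
    so the worst case equals the largest single base objective.
    """
    return max(objectives(theta))


# Crude grid search over theta in [0, 1]; fine for a one-dimensional toy.
grid = [i / 1000 for i in range(1001)]
theta_uniform = min(grid, key=lambda t: mixture_loss(t, [0.5, 0.5]))
theta_agnostic = min(grid, key=agnostic_loss)

print("uniform weights :", theta_uniform, objectives(theta_uniform))
print("worst-case view :", theta_agnostic, objectives(theta_agnostic))
```

In this toy, the uniform mixture settles on a parameter that leaves the first objective noticeably worse than the second, while the worst-case solution roughly equalizes the two.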


Agnostic Learning of a Single Neuron with Gradient Descent

Neural Information Processing Systems

We consider the problem of learning the best-fitting single neuron as measured by the expected square loss $\mathbb{E}_{(x,y)\sim \mathcal{D}}[(\sigma(w^\top x)-y)^2]$ over some unknown joint distribution $\mathcal{D}$ by using gradient descent to minimize the empirical risk induced by a set of i.i.d.
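The setup in this abstract, gradient descent on the empirical square loss of a single neuron, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure or analysis: the ReLU activation, the synthetic Gaussian inputs with noisy labels, the initialization, the step size, and the number of steps are all hypothetical choices.

```python
import random


def relu(z):
    """ReLU activation; the choice of sigma is an assumption of this sketch."""
    return z if z > 0.0 else 0.0


def relu_grad(z):
    """Derivative of ReLU, taken to be 0 at z = 0."""
    return 1.0 if z > 0.0 else 0.0


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def empirical_risk(w, samples):
    """Average of (sigma(w.x) - y)^2 over the samples."""
    return sum((relu(dot(w, x)) - y) ** 2 for x, y in samples) / len(samples)


def gradient_descent(samples, w0, lr=0.1, steps=200):
    """Full-batch gradient descent on the empirical risk."""
    w = list(w0)
    n = len(samples)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in samples:
            z = dot(w, x)
            coeff = 2.0 * (relu(z) - y) * relu_grad(z) / n
            for j, xj in enumerate(x):
                grad[j] += coeff * xj
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


# Hypothetical synthetic data: Gaussian inputs, labels from a neuron plus noise.
rng = random.Random(0)
w_true = [1.0, -0.5]
samples = []
for _ in range(200):
    x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    y = relu(dot(w_true, x)) + rng.gauss(0.0, 0.1)
    samples.append((x, y))

w0 = [0.1, 0.1]  # small nonzero start; at w = 0 the ReLU gradient vanishes
w_hat = gradient_descent(samples, w0)
print("risk at start:", empirical_risk(w0, samples))
print("risk at end  :", empirical_risk(w_hat, samples))
```

The nonzero initialization matters in this sketch: with the derivative convention above, starting exactly at the origin gives a zero gradient and the iterate never moves.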


Smoothed Agnostic Learning of Halfspaces over the Hypercube

arXiv.org Machine Learning

Agnostic learning of Boolean halfspaces is a fundamental problem in computational learning theory, but it is known to be computationally hard even for weak learning. Recent work [CKKMK24] proposed smoothed analysis as a way to bypass such hardness, but existing frameworks rely on additive Gaussian perturbations, making them unsuitable for discrete domains. We introduce a new smoothed agnostic learning framework for Boolean inputs, where perturbations are modeled via random bit flips. This defines a natural discrete analogue of smoothed optimality generalizing the Gaussian case. Under strictly subexponential assumptions on the input distribution, we give an efficient algorithm for learning halfspaces in this model, with runtime and sample complexity approximately $n^{\mathrm{poly}(1/(\sigma \epsilon))}$. Previously, such algorithms were known only with strong structural assumptions for the discrete hypercube, for example, independent coordinates or symmetric distributions. Our result provides the first computationally efficient guarantee for smoothed agnostic learning of halfspaces over the Boolean hypercube, bridging the gap between worst-case intractability and practical learnability in discrete settings.
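To make the idea of perturbing Boolean inputs by random bit flips concrete, here is a small sketch. The abstract does not spell out the perturbation model, so this assumes each coordinate of a point in {-1, +1}^n is flipped independently with probability `sigma`. The function names, the example halfspace, and the Monte Carlo estimate are hypothetical and are not taken from the paper.

```python
import random


def flip_bits(x, sigma, rng):
    """Flip each +/-1 coordinate independently with probability sigma (assumed model)."""
    return [-xi if rng.random() < sigma else xi for xi in x]


def halfspace(w, theta, x):
    """Boolean halfspace: +1 if w.x >= theta, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else -1


def smoothed_value(w, theta, x, sigma, rng, trials=2000):
    """Monte Carlo estimate of the halfspace's average output on perturbed copies of x."""
    total = sum(halfspace(w, theta, flip_bits(x, sigma, rng)) for _ in range(trials))
    return total / trials


rng = random.Random(0)
x = [1, -1, 1, 1, -1]
w = [1.0, 1.0, 1.0, 1.0, 1.0]  # hypothetical majority-style halfspace

print("unperturbed label:", halfspace(w, 0.0, x))
print("smoothed estimate:", smoothed_value(w, 0.0, x, sigma=0.1, rng=rng))
```

With `sigma = 0` the input is left unchanged, and as `sigma` grows the averaged output depends less sharply on any single coordinate of `x`.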